The aim was to prepare a spatio-temporal representation of valuation studies related to biodiversity and ecosystem services and … .
To identify country names in the corpus of literature, we used a two-step approach. First, to understand where studies were conducted, we searched the title, abstract, and keywords of each paper for country names. Second, to understand where the funding institutions were located, we searched the affiliations, acknowledgments, and funding text for country names.
The input data we used are the following:

- Bib file downloaded from Web of Science
- ISO 3166-1 alpha-3 country codes (https://www.iso.org/iso-3166-country-codes.html)
- IPBES regional and subregional area dataset (https://doi.org/10.5281/zenodo.3923633)
The Python code used to georeference the corpus can be found here. The following schematic gives an overview of the pipeline, and each step is described below.
knitr::include_graphics("pilot2.svg")
Overview of the process of georeferencing the corpus of valuation studies
Step 1: Extract country names from text. Country names were extracted from the title, abstract, and keywords of each paper with a regular expression, and the associated ISO code was added to a column in the dataset. The same regular expression was also used to search the affiliations, acknowledgments, and funding text of the same paper, and the resulting ISO codes were placed into a second column.
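The extraction in Step 1 could look roughly like the following sketch. The lookup table `COUNTRY_TO_ISO3`, the function `extract_iso3`, and the record field names are hypothetical names chosen for illustration; the regular expression actually used for the corpus is not reproduced in this document and may differ.

```python
import re

# Hypothetical lookup: a real run would load every ISO 3166-1 name and code.
COUNTRY_TO_ISO3 = {
    "Germany": "DEU",
    "Brazil": "BRA",
    "United States": "USA",
    "Niger": "NER",
    "Nigeria": "NGA",
}

# Longest names first so "Nigeria" is tried before "Niger"; the word
# boundaries (\b) also stop "Niger" from matching inside "Nigeria".
_names = sorted(COUNTRY_TO_ISO3, key=len, reverse=True)
COUNTRY_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _names) + r")\b",
    re.IGNORECASE,
)
_LOWER_TO_ISO3 = {name.lower(): code for name, code in COUNTRY_TO_ISO3.items()}


def extract_iso3(text):
    """Return the sorted, de-duplicated ISO3 codes of countries named in text."""
    matches = COUNTRY_RE.finditer(text or "")
    return sorted({_LOWER_TO_ISO3[m.group(1).lower()] for m in matches})


# Hypothetical record with made-up content.
record = {
    "title": "Valuing wetlands in Nigeria and Brazil",
    "abstract": "We survey households in Brazil.",
    "keywords": "ecosystem services",
    "funding": "Funded by a research council in Germany.",
}
study_iso3 = extract_iso3(
    " ".join([record["title"], record["abstract"], record["keywords"]])
)
funding_iso3 = extract_iso3(record["funding"])
```

Building one alternation from the name list keeps the same pattern reusable for both column groups, which mirrors the description above of applying one regular expression twice.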
Steps 2 and 3: Bundle countries in regions. The IPBES Regions and Subregions dataset was then used to add region and subregion attributes to the dataset by matching on the ISO3 code.
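As an illustration of the ISO3 match, the sketch below looks up region and subregion for each extracted code. The table `REGIONS_BY_ISO3` holds only two illustrative rows and the function name `add_regions` is hypothetical; in practice the mapping would be read from the IPBES dataset cited above.

```python
# Illustrative rows only; the real mapping comes from the IPBES dataset.
REGIONS_BY_ISO3 = {
    "BRA": {"region": "Americas", "subregion": "South America"},
    "NGA": {"region": "Africa", "subregion": "West Africa"},
}


def add_regions(iso3_codes):
    """Attach region and subregion to each ISO3 code (None when unmatched)."""
    rows = []
    for code in iso3_codes:
        info = REGIONS_BY_ISO3.get(code, {})
        rows.append(
            {
                "iso3": code,
                "region": info.get("region"),
                "subregion": info.get("subregion"),
            }
        )
    return rows


regions = add_regions(["BRA", "NGA", "XXX"])
```

Keeping unmatched codes with empty attributes, rather than dropping them, makes gaps in the lookup table visible; whether the original pipeline does this is not stated.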
Step 4: Find TS accordingly. We then used a set of files to add topic attributes to the dataset. These files contained identifying information for papers returned by sets of Web of Science searches, each targeting a particular topic. This identifying information was matched to the corpus, and the topic was extracted.
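The topic matching can be sketched as an inverted lookup from paper identifier to topic. The field name `paper_id`, the topic labels, and the function `tag_topics` are assumptions for illustration; the document does not say which identifier the search files share with the corpus.

```python
def tag_topics(corpus, topic_sets):
    """Add a sorted 'topics' list to each record by matching identifiers.

    corpus: list of dicts, each with a hypothetical 'paper_id' key.
    topic_sets: dict mapping a topic label to the identifiers that the
        corresponding search returned.
    """
    topics_by_id = {}
    for topic, ids in topic_sets.items():
        for paper_id in ids:
            topics_by_id.setdefault(paper_id, set()).add(topic)
    return [
        dict(record, topics=sorted(topics_by_id.get(record["paper_id"], ())))
        for record in corpus
    ]


# Made-up identifiers and topic labels.
corpus = [{"paper_id": "P1"}, {"paper_id": "P2"}, {"paper_id": "P3"}]
topic_sets = {"topic_a": ["P1", "P2"], "topic_b": ["P2"]}
tagged = tag_topics(corpus, topic_sets)
```

A paper can appear in more than one search, so each record carries a list of topics rather than a single value.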
Finally, the complete corpus, with the added country ISO codes for both funding institutions and research locations and the topic identification, was used as the basis for the rest of the research project.
The IPBES Core Indicators were used alongside a chosen set of other relevant indicators to understand geographic trends between the density of studies and density of institutes.
We used the most recent available year of every IPBES Core Indicator in the country dataset except two, Countries/Regions with Active NBSAP and Category 1 nations in CITES, which are binary in the dataset and therefore not compatible with the following analysis. For indicators with multiple categories, we selected one specific category. For example, for the indicator “Area of forest production under FSC and PEFC certification” we chose the FSC certification area and not the PEFC certification area.
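A minimal sketch of this selection, assuming the indicator data are available as long-format rows with hypothetical keys `iso3`, `indicator`, `category`, `year`, and `value`. It takes the latest year per indicator (one year for all countries), which is one reading of the description above; the values are made up.

```python
def select_latest(rows, indicator, category):
    """Return (latest_year, {iso3: value}) for one indicator and category."""
    subset = [
        r for r in rows
        if r["indicator"] == indicator and r["category"] == category
    ]
    if not subset:
        return None, {}
    latest_year = max(r["year"] for r in subset)
    values = {r["iso3"]: r["value"] for r in subset if r["year"] == latest_year}
    return latest_year, values


NAME = "Area of forest production under FSC and PEFC certification"
# Made-up values for illustration only.
rows = [
    {"iso3": "DEU", "indicator": NAME, "category": "FSC_area", "year": 2015, "value": 1.0},
    {"iso3": "DEU", "indicator": NAME, "category": "FSC_area", "year": 2016, "value": 2.0},
    {"iso3": "DEU", "indicator": NAME, "category": "PEFC_area", "year": 2016, "value": 9.0},
    {"iso3": "BRA", "indicator": NAME, "category": "FSC_area", "year": 2016, "value": 3.0},
]
year, fsc = select_latest(rows, NAME, "FSC_area")
```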
Here is the table of all of the IPBES Core Indicators used, the category selected, the year the data is from, and the number assigned to them.
| Name | Category | Year | Number |
|---|---|---|---|
| Area of forest production under FSC and PEFC certification | FSC_area | 2016 | 1 |
| Biodiversity Habitat Index | Average | 2014 | 2 |
| Biodiversity Intactness Index | Value | 2005 | 3 |
| Biocapacity per capita | Value - Total | 2012 | 4 |
| Ecological Footprint per capita | Value - Total | 2012 | 5 |
| Forest area | Forest area (1000ha) | 2015 | 6 |
| Water Footprint | Water Footprint - Total (Mm3/y) | 2013 | 7 |
| Inland Fishery Production | Capture | 2015 | 8 |
| Region-based Marine Trophic Index | 1950 | 2014 | 9 |
| Nitrogen + Phosphate Fertilizers | N total nutrients - Consumption in nutrients | 2014 | 10 |
| Nitrogen Use Efficiency (%) | Nitrogen Use Efficiency (%) | 2009 | 11 |
| Percentage and total area covered by protected areas | Terrestrial - Protected Area (%) | 2017 | 13 |
| Percentage of undernourished people | Prevalence of undernourishment (%) (3-year average) | 2015 | 15 |
| Proportion of local breeds, classified as being at risk, not-at-risk or unknown level of risk of extinction | At Risk of Extinction | 2016 | 16 |
| PA of Key Biodiversity Areas Coverage (%) | Estimate | 2016 | 17 |
| Protected area management effectiveness | PA Assessed on Management Effectiveness (%) | 2015 | 18 |
| Protected Area Connectedness Index | Protected Area Connectedness Index | 2012 | 19 |
| Species Habitat Index | Species Habitat Index | 2014 | 20 |
| Species Protection Index (%) | Species Protection Index (%) | 2014 | 21 |
| Species Status Information Index | Value | 2014 | 22 |
| Total Wood Removals (roundwood, m3) | Total | 2014 | 23 |
| Trends in forest extent (tree cover) | Percentage of Tree Cover Loss | 2015 | 24 |
| Nitrogen Deposition Trends (kg N/ha/yr) | Nitrogen Deposition Trends (kg N/ha/yr) | 2030 | 25 |
| Trends in Pesticides Use | Use of pesticides (3-year average) | 2013 | 26 |
A few countries had duplicated values. These were double-checked against the original dataset and the erroneous value was removed. Examples include two values for the USA, caused by the separate listing of Hawaii in the original dataset. In these cases the Hawaii value was removed and the value for the rest of the country was used instead. Additionally, for Indicator 9, Region-based Marine Trophic Index, the mean of the regional values was calculated per country, because countries such as Germany have multiple regions with distinct values.
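The two cleaning rules can be sketched as follows. The row layout `(iso3, area_label, value)`, the function names, and all values are hypothetical, and the example mixes both cases in one list only for brevity.

```python
from collections import defaultdict
from statistics import mean

# Made-up values; each row is (iso3, area_label, value).
rows = [
    ("USA", "United States", 10.0),
    ("USA", "Hawaii", 2.0),
    ("DEU", "Region 1", 3.0),
    ("DEU", "Region 2", 4.0),
]


def duplicated_codes(rows):
    """Return the ISO3 codes that occur on more than one row."""
    seen, dupes = set(), set()
    for iso3, _label, _value in rows:
        (dupes if iso3 in seen else seen).add(iso3)
    return sorted(dupes)


def drop_areas(rows, labels):
    """Remove rows whose area label was judged erroneous after manual checking."""
    return [row for row in rows if row[1] not in labels]


def mean_per_country(rows):
    """Average all remaining values per ISO3 code (used for Indicator 9)."""
    grouped = defaultdict(list)
    for iso3, _label, value in rows:
        grouped[iso3].append(value)
    return {iso3: mean(values) for iso3, values in grouped.items()}


cleaned = drop_areas(rows, {"Hawaii"})
per_country = mean_per_country(cleaned)
```

Listing duplicated codes first supports the manual check described above; the decision about which row to drop stays with the analyst rather than with the code.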
A set of other indicators was included in the analysis to expand the coverage of socioeconomic variables: the Human Development Index (HDI), average harmonized learning outcomes score, gross domestic product (GDP), corruption perception index (CPI), and population.
These datasets were downloaded, cleaned, and had ISO3 codes added to easily merge them into the analysis. The latest data available was used for each indicator.
To understand how valuation is spread across geographies, we counted the number of times each country’s ISO code appeared in the corpus for both geography columns added in step 2. The result is the density of studies per country and the density of funding institutions per country for the entire corpus.
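The counting step can be sketched with `collections.Counter`. The column names `study_iso3` and `funding_iso3` are hypothetical, and counting each country at most once per paper is an assumption that follows from storing de-duplicated codes per record.

```python
from collections import Counter


def country_density(corpus, column):
    """Count in how many records each ISO3 code appears in the given column."""
    counts = Counter()
    for record in corpus:
        # set() ensures a country is counted once per paper.
        counts.update(set(record.get(column, [])))
    return counts


# Made-up records for illustration.
corpus = [
    {"study_iso3": ["BRA", "NGA"], "funding_iso3": ["DEU"]},
    {"study_iso3": ["BRA"], "funding_iso3": ["DEU", "BRA"]},
    {"study_iso3": [], "funding_iso3": []},
]
study_density = country_density(corpus, "study_iso3")
funding_density = country_density(corpus, "funding_iso3")
```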
The external indicators were also joined onto the dataset to analyze the relationships between these socioeconomic indicators and the density of studies and funding institutions.
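One way to relate an indicator to a density is sketched below. The document does not name the statistic used, so the Pearson correlation here is an assumption, as are the base-10 logarithm, the exclusion of countries with zero counts before taking logs, and the function names. Country codes and values are placeholders.

```python
import math


def join_on_iso3(density, indicator):
    """Return paired value lists for the countries present in both inputs."""
    shared = sorted(set(density) & set(indicator))
    return [density[c] for c in shared], [indicator[c] for c in shared]


def pearson(xs, ys):
    """Pearson correlation; needs at least two points and non-constant inputs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally long sequences with n >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder codes and made-up values.
density = {"AAA": 1, "BBB": 10, "CCC": 100}
indicator = {"AAA": 0.0, "BBB": 1.0, "CCC": 2.0, "DDD": 5.0}

counts, values = join_on_iso3(density, indicator)
r_raw = pearson(counts, values)

# Log density: keep only countries with at least one study.
pairs = [(math.log10(c), v) for c, v in zip(counts, values) if c > 0]
r_log = pearson([p[0] for p in pairs], [p[1] for p in pairs])
```

In this made-up example the relationship is linear only on the log scale, so `r_log` is higher than `r_raw`, which is the kind of difference a log-density plot is meant to expose.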
The uptake dataset was filtered based on a set of queries listed and explained below. For the filtered dataset, we again counted the number of times each country’s ISO code appeared and joined the external indicators.
This process was also repeated with an additional filter that excluded any studies published before 2010.
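The year filter can be sketched as below. The `year` field name is an assumption (BibTeX exports usually store it as a string), and dropping records with a missing or unparsable year is a choice made here, not something the document specifies.

```python
def published_from(corpus, first_year=2010):
    """Keep records published in first_year or later."""
    kept = []
    for record in corpus:
        try:
            year = int(record.get("year", ""))
        except (TypeError, ValueError):
            # Missing or malformed year: excluded in this sketch.
            continue
        if year >= first_year:
            kept.append(record)
    return kept


# Made-up records for illustration.
corpus = [
    {"paper_id": "P1", "year": "2009"},
    {"paper_id": "P2", "year": "2010"},
    {"paper_id": "P3", "year": 2018},
    {"paper_id": "P4"},
    {"paper_id": "P5", "year": None},
]
recent = published_from(corpus)
```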
The trends between each indicator and the density of studies and of funding institutions are shown here for the entire corpus.
knitr::include_graphics("Outputs/Corpus/Correlation25_Names1.png")
IPBES Core Indicators vs. Density of studies
knitr::include_graphics("Outputs/Corpus/Correlation25_Names1_log.png")
IPBES Core Indicators vs. Log density of studies
knitr::include_graphics("Outputs/Corpus/Correlation25_Names2.png")
IPBES Core Indicators vs. Density of funding institutions
knitr::include_graphics("Outputs/Corpus/Correlation25_Names2_log.png")